Systematic Biology
◐ Oxford University Press (OUP)
Preprints posted in the last 90 days, ranked by how well they match Systematic Biology's content profile, based on 121 papers previously published here. The average preprint has a 0.03% match score for this journal, so anything above that is already an above-average fit.
Berling, L.; Colijn, C.
Show abstract
Bayesian phylogenetic inference produces large samples from a posterior distribution over phylogenetic trees that represents uncertainty in both tree topology and associated variables. Such a collection of trees is hard to interpret and it is common practice to summarize such samples into a single representative tree. Methods for constructing representative trees have largely been restricted to plain tree topologies, encoding only relationships among taxa. Inference with more sophisticated models produce annotated tree objects. These have additional information representing nodes locations in the case of phylogeography, host information when inferring transmission trees, or sampled ancestor status when incorporating fossil information. Nevertheless, these annotated representations are reduced to a single representative tree, typically using methods developed for plain tree topologies and without accounting for the resulting methodological mismatch. Here, we introduce the concept of an extended clade and investigate an extension of the conditional clade distribution (CCD) model. Through motivating examples and case studies in discrete trait phylogeography and transmission tree reconstruction, we demonstrate limitations of standard summary tree approaches and show how these can be addressed using an extended CCD framework that explicitly incorporates the annotated tree structure.
Milkey, A.; Chen, J.; Lewis, P. O.
Show abstract
AO_SCPLOWBSTRACTC_SCPLOWAs modern phylogenomics datasets become increasingly large, it is useful to develop recommendations for how to subsample datasets for best species tree inference. Here we apply a new measure of phylogenetic information content that estimates the reduction in tree space occupied by a posterior sample of inferred trees relative to a prior sample in order to assess the effects of gene tree parameters on species tree estimation. We find that, consistent with earlier studies, when data are informative, more data result in better species tree inference. However, when data are uninformative, subsampling a dataset to include only the most informative loci may produce a better species tree sample. We perform analyses on a variety of simulated and empirical datasets.
Ren, H.; Jiang, C.; Wong, T. K. F.; Shao, Y.; Susko, E.; Minh, B. Q.; Lanfear, R.
Show abstract
Partitioned and mixture models are widely employed in Maximum Likelihood phylogenetic analyses of large genomic datasets. Comparing the fit of the two types of models has been challenging, because standard information-theoretic approaches cannot be applied. Mixture models are increasingly popular for the analysis of amino acid datasets and can lead to different conclusions compared to partitioned models. This raises an important question - which type of model tends to perform better? Susko et al. (2026) recently introduced the marginal Akaike information criterion (mAIC), which allows mixture models and partitioned models to be directly compared for the first time. Here, we use the mAIC and a range of other approaches to compare the fit of mixture and partitioned models across a diverse set of empirical datasets. We show that mixture models are universally favoured on amino acid datasets. This has important implications for interpreting empirical analyses and suggests that continued development of mixture models is an important avenue for future research.
Nagel, A. A.; Landis, M. J.
Show abstract
Ancestral state reconstruction is a classical problem of broad relevance in phylogenetics. Likelihood-based methods for reconstructing ancestral states under discrete character models, such as Markov models, have proven extremely useful, but only work so long as the assumed model yields a tractable likelihood function. Unfortunately, extending a simple but tractable phylogenetic model to possess new, but biologically realistic, properties often results in an intractable likelihood, preventing its use in standard modeling tasks, including ancestral state reconstruction. The rapid advancement of deep learning offers a potential alternative to likelihood-based inference of ancestral states, particularly for models with intractable likelihoods. In this study, we modify the phylogenetic deep learning software O_SCPLOWPHYDDLEC_SCPLOW to conduct ancestral state reconstruction. We evaluate O_SCPLOWPHYDDLEC_SCPLOWs performance under various methodological and modeling conditions, while comparing to Bayesian inference when possible. For simple models and small trees, its performance resembles the performance of Bayesian inference, but worsens as tree size increases. While O_SCPLOWPHYDDLEC_SCPLOW still performs adequately for more complex models, such as speciation and extinction models, the estimates differ more from Bayesian inference in comparison with simpler models. Lastly, we use O_SCPLOWPHYDDLEC_SCPLOW to infer ancestral states for two empirical datasets, one of the ancestral ranges of a subclade of the genus Liolaemus and ancestral locations for sequences from the 2014 Sierra Leone Ebola virus disease outbreak.
Li, B.; Ane, C.
Show abstract
Phylogenetic network inference methods are increasingly used to detect hybridization and gene flow from genomic data, but their robustness to common sources of model violation remains poorly characterized. We conducted a simulation study to evaluate the effects of hidden paralogy and substitution rate variation on two widely used network inference methods: find_graphs from ADMIXTOOLS 2 and SNaQ. Using an eight-taxon species tree calibrated from an empirical reptile phylogeny, we simulated data under various levels of hidden paralogy (from none to strong) and three levels of rate variation (none, gene-specific, and lineage-specific). We found that hidden paralogy had limited impact on network inference under the conditions examined: both network methods correctly favored a tree without reticulation, and ASTRAL recovered the correct species tree every time. In contrast, lineage-specific rates severely biased find_graphs, inflating worst f-statistic residuals well beyond the standard acceptance threshold. SNaQ correctly selected a tree model almost always across all conditions, though its network with h = 1 reticulation displayed the true species tree with a lower probability under lineage-specific rates. We also show that the standard worst residuals threshold of 3 for find_graphs produces inflated type I error even without rate variation, and we recommend empirical calibration of this threshold within each study system.
Revell, L. J.; Alencar, L. R. V.; Alfaro, M. E.; Dain, J.; Hill, N. J.; Jones, M.; Martinet, K. M.; Romero-Alarcon, V.; Harmon, L. J.
Show abstract
The practical utility of many modern phylogenetic comparative methods can depend on how accurately mathematical models capture the evolutionary process of traits. Boucher and Demery (2016) described a new quantitative trait model, Brownian motion with reflective limits, that they anticipated might be of use in testing hypotheses about a particular sort of constraint on phenotypic character evolution. Since their analytic solution for the probability function under this bounded evolutionary scenario was not practical to evaluate for reasonably-sized trees, Boucher and Demery (2016) also identified a creative technique for computing the likelihood of their model. The basis of this methodology derives from the convergence of an equal-rates, symmetric, ordered Markov chain and continuous stochastic diffusion in the limit as the number of steps in our chain goes to {infty} (or, alternatively, as their widths decrease towards zero). We refer to this convergence in the limit as the discretized diffusion approximation or (more compactly) the discrete approximation. We realized that this discrete approximation of Boucher and Demery (2016) unlocked a number of additional models for the phylogenetic comparative analysis of discrete and continuous trait data, and we explore several of these in the present article. Specifically, we examine application of this discretized diffusion approximation to the threshold model from evolutionary quantitative genetics, to a new "semi-threshold" trait evolution model, to a joint model of discrete and continuous traits in which the discrete trait influences the rate of evolution of our continuous character, as well as a model where precisely the converse is true, and to a discrete character dependent multi-trend trended continuous trait evolution model. We conclude with some context for the origins of our article and discussion of other possible applications of this powerful approach.
Milkey, A.; Lewis, P. O.
Show abstract
AO_SCPLOWBSTRACTC_SCPLOWA new Bayesian measure of phylogenetic information content is introduced based on geodesic distances in treespace. The measure is based on the relative variance of phylogenetic trees sampled from the posterior distribution compared to the prior distribution. This ratio is expected to equal 1 if there is no information in the data about phylogeny and 0 if there is complete information. Trees can be scaled to have the same mean tree length to avoid dominance by edge length information and focus on topological information. The method scales well, requiring only that a valid sample can be obtained from both prior and posterior distributions. We show how dissonance (information conflict) among data sets can also be estimated. Both simulated and empirical examples are provided to illustrate that the new approach produces sensible and intuitive results.
Duke, J. D.; Guo, J.; Forest, F.; Gumbs, R.; McTavish, E. J.; Rosindell, J.
Show abstract
Time-scaled phylogenetic trees summarising evolutionary relationships are fundamental to many analyses in biology, from diversification rate estimation to conservation prioritisation. The most comprehensive available summary of these relationships, the Open Tree of Life, synthesises information from over two thousand studies into a supertree covering the full range of global biodiversity, but its use in downstream analyses is limited by the lack of divergence times. Previous work has mapped dates from Open Tree's database of trees to certain nodes in the supertree, but for the majority of nodes no date is available. While algorithms exist to interpolate missing dates in a tree, we found that their time and memory requirements scaled quadratically with the number of nodes, which made it computationally infeasible to run them on the entire tree. In this work, we describe novel date interpolation algorithms that scale linearly with the number of nodes. These enabled us to produce a distribution of fully-dated trees containing 2.3 million extant described species, greatly expanding the scope of feasible phylogenetic analyses. We illustrate the utility of these trees by computing the most robust estimate yet of the phylogenetic diversity of the complete tree of life, incorporating both topological and temporal uncertainty.
Ane, C.; Bastide, P.
Show abstract
Most phylogenetic comparative methods use a species-level phylogeny, ignoring the effect of incomplete lineage sorting (ILS) and hemiplasy on the traits of interest. We consider here a trait controlled additively by one or more unknown loci. Their gene trees may differ from the species phylogeny due to ILS, as modeled by the coalescent process. If the species phylogeny is a network, this process also accounts for gene flow, admixture or hybridization. Our model allows for polymorphism in the ancestral population at the root of the species phylogeny, and predicts heritable within-population variation due to ILS. Even if each locus evolves according to a Brownian motion, the joint distribution of all trait measurements is not generally Gaussian due to ILS. We provide a Gaussian approximation, named the Gaussian Coalescent, and show how to compute its variance matrix efficiently using a single traversal of the species phylogeny. In simulations, this model is much more accurate than the model ignoring ILS. In simulations and on a data set of tomato floral traits, it is favored over the standard Brownian motion model with extra within-population variance. The GC model opens new avenues for various phylogenetic comparative methods, accounting for hemiplasy and gene flow simultaneously. It is implemented in phylolm v2.7.0 and in PhyloTraits v1.2.0.
King, B.
Show abstract
Simulation-based calibration (SBC) checking is a method to ensure that the inference machinery for a Bayesian statistical analysis is functioning in a correct and unbiased manner. Typically, SBC begins with sampling parameter values from the model priors (prior SBC). However, it has been shown that prior SBC can miss problems when these manifest only in certain regions of parameter space. In phylogenetics, this is relevant not only because of the vastness of tree and parameter space, but also because many phylogenetic analyses involve some degree of model misspecification. Posterior SBC is a recently developed method for checking that the inference algorithms function correctly for a given empirical dataset. Here I use posterior SBC to test the implementation of phylogenetic dating methods in the inference software BEAST 2. I test both the tip-dated approach, employing an Indo-European vocabulary dataset, and the node-dated approach, employing a molecular rRNA dataset of Tabanidae (horseflies). In both cases, posterior SBC tests indicate good calibration. Despite this, posterior predictive datasets simulated from the posterior distribution provided no further increase in the precision of node age estimates compared to the original posterior, a result consistent with previous literature showing fundamental theoretical limits to the identifiability of node ages. Nevertheless, these results suggest that phylogenetic dating methods in BEAST 2 are not biased by problems with the inference machinery, thereby increasing confidence in results obtained using these methods.
De Maio, N.
Show abstract
Maximum likelihood phylogenetic methods are popular approaches for estimating evolutionary histories. These methods do not assume prior hypotheses regarding the shape of the phylogenetic tree, and this lack of prior assumptions can be useful in particular in case of idiosyncratic sampling patterns. For example, the rate at which species are sequenced can differ widely between lineages, with lineages more of interest to humans being usually sequenced more often than others. However, in some settings sampling can be lineage-agnostic. In genomic epidemiology, for example, the sequencing rate can change through time or across locations, but is often agnostic to the specific pathogen strain being sequenced. In this scenario, one expects that the abundance of a pathogen strain at a specific time and location in the host population is reflected in the relative abundance of that strain among the genomes sequenced at that time and location. Here, I show that this simple assumption, when appropriate and incorporated within maximum likelihood phylogenetics, can greatly improve the accuracy of phylogenetic inference. This is similar to the famous medical principle "when you hear hoofbeats, think of horses, not zebras". In our application this means that, when for example observing a (possibly incomplete) genome sequence that has a similar likelihood of belonging to multiple different strains, I aim to prioritize phylogenetic placement onto a common strain (the "horse", a common disease) rather than a rare one (the "zebra", a rare disease). I introduce and assess two separate approaches to achieve this. The first approach rescales the likelihood of a phylogenetic tree by the number of distinct binary topologies obtainable by arbitrarily resolving multifurcations in the tree. This approach is based on a new interpretation of multifurcating phylogenetic trees particularly relevant at low divergence: multifurcations represent a lack of signal for resolving the bifurcating topology rather than an instantaneous multifurcating event, and so a multifurcating tree is interpreted as the set of bifurcating trees consistent with the multifurcating one, rather than as a single multifurcating topology. The second approach instead includes a tree prior that assumes that genomes are sequenced at a rate proportional to their abundance. Both approaches favor phylogenetic placement at abundant lineages, and using simulations I show that both methods dramatically improve the accuracy of phylogenetic inference in scenarios like SARS-CoV-2 phylogenetics, where large multifurcations are common. This considerable impact is also observed in real pandemic-scale SARS-CoV-2 genome data, where accounting for lineage prevalence reduces phylogenetic uncertainty by around one order of magnitude. Both approaches were implemented as part of the free and open source phylogenetic software MAPLE v0.7.5.4 (https://github.com/NicolaDM/MAPLE).
Hipp, A. L.; Althaus, K. N.; Fuller, E. L.; Hahn, M.; Larson, D. A.; Mohn, R. A.; Wang, B.; Manos, P. S.
Show abstract
Forest trees pose numerous potential challenges to phylogenomic inference. Their large effective population sizes and relatively long generation times lead to deep allele coalescence and consequently incomplete lineage sorting (ILS), which biases inferences of divergence times toward older ages and introduces gene tree discordance. Deep phylogenetic divergences, reaching back into the Paleocene, introduce reference-mapping biases. Introgression--the movement of genes between lineages--may result in different phylogenies being inferred depending on which individuals are included in analysis, even if the plurality of the genome favors the divergence history unaffected by introgression. These factors influence phylogenetic inference across the Tree of Life but are particularly prevalent in forest trees. Oaks (Quercus) are notable for all three influences. In addition, our knowledge of the oak phylogeny is currently based strongly on restriction site associated DNA sequencing (RADseq) datasets published over the past decade, which may introduce additional sources of uncertainty. In this chapter, we analyze a 322-species RADseq dataset and genome resequencing data from across the genus to address sources of uncertainty in our understanding of the global oak phylogeny, which we hope will serve as a model for other research groups working on comparable woody plant groups.
Scutt, C. N.; Cooper, N.; Thomas, G. H.; Guillerme, T.
Show abstract
Morphological trait datasets and phylogenies are routinely paired to investigate macroevolutionary patterns during disparity analyses. However, incomplete fossil sampling can distort disparity estimates, obscuring true evolutionary signals. Ancestral state estimation can be used for both continuous and discrete traits to extend these analyses beyond incomplete fossil data, such as investigations into disparity through time. However, when ancestral state estimation occurs in the disparity pipeline, and the inevitable uncertainty in these estimates, complicate their integration. Determining the most robust workflow for integrating ancestral state estimation in disparity analyses remains a critical methodological challenge. Using simulations to attain a ground-truth disparity value, we evaluated different approaches to performing ancestral state estimation and incorporating uncertainty across varying continuous and discrete trait models, fossil sampling densities and disparity metrics. Ancestral state estimation generally improved recovery of true disparity relative to tip-only analyses, though the optimal approach depended on the interaction between trait model and fossil sampling density. For continuous traits, probabilistic approaches were most accurate, but were sensitive to model misspecification under low fossil sampling density. For discrete traits, pre-ordination methods were most reliable and probabilistic approaches outperformed point estimates under low sampling, while point estimates became increasingly accurate as sampling density increased. Fossil sampling density was a stronger predictor of disparity accuracy than estimation method choice, underscoring that methodologies are only as powerful as the data provided. Our findings offer a practical decision framework for selecting the most appropriate workflow given the sampling density and trait characteristics of a dataset.
Takazawa, Y.; Takeda, A.; Hayamizu, M.; Gascuel, O.
Show abstract
Phylogenetic analyses often require the summarization of multiple trees, e.g., in Bayesian analyses to obtain the centroid of the posterior distribution of trees, or to determine the consensus of a set of bootstrap trees. The majority-rule consensus tree is the most commonly used. It is easy to compute and minimizes the sum of Robinson-Foulds (RF) distances to the input trees. In mathematical terms, the majority-rule consensus tree is the median of the input trees with respect to the RF distance. However, due to the coarse nature of RF distance, which only considers whether two branches induce exactly the same bipartition of the taxa or not, highly unresolved trees can be produced when the phylogenetic signal is low. To overcome this limitation, we propose using median trees with respect to finer-grained dissimilarity measures between trees. These measures include a quartet distance between tree topologies, and transfer distances, which quantify the similarity between bipartitions, in contrast to the 0/1 view of RF. We describe fast heuristic consensus algorithms for transfer-based tree dissimilarities, capable of efficiently processing trees with thousands of taxa. Through evaluations on simulated datasets in both Bayesian and bootstrapping maximum-likelihood frameworks, our results show that our methods improve consensus tree resolution in scenarios with low to moderate phylogenetic signal, while providing better or comparable dissimilarities to the true phylogeny. Applying our methods to Mammal phylogeny and a large HIV dataset of over nine thousand taxa confirms the improvement with real data. These results demonstrate the usefulness of our new consensus tree methods for analyzing the large datasets that are available today. Our software, PhyloCRISP, is available from https://github.com/yukiregista/PhyloCRISP.
Reinales, S.; Forest, F.; Zuntini, A.; Cardoso, D.; Ballen, G. A.; Cardenas, D.; Pirani, J. R.
Show abstract
Obtaining large and well-resolved phylogenetic trees for neotropical clades is challenging, as many species inhabit remote regions, and sampling often relies on herbarium specimens with highly degraded DNA. Target capture provides an effective solution for retrieving molecular data from fragmentary material. However, data processing using tools generally designed for diploid organisms and single-copy loci is also challenging, particularly when events such as genome duplication and hybridisation have shaped the lineage evolution. We used dual-hybridisation to integrate Ochnaceae-specific and universal probes to reconstruct the phylogenetic relationships of Sauvagesieae, a pantropical clade with ca. 90 species mainly distributed in the northern Andes, the Brazilian Espinhaco Range, and the Amazon-Guyana region. We tested different filtering strategies involving missing data and paralogs to assess probable sources of tree discordance and topological uncertainty. We found no significant benefit in reducing tree discordance after removing entire genes due to the presence of paralogs or a high amount of missing data. Removing fragmentary sequences instead improved alignments and increased branch support of gene trees. By quantifying the proportion of SNPs, analysing the distribution of the allele frequencies, and gene-tree quartet frequencies, we found evidence of polyploidisation and hybridisation, which could reduce resolution at internal nodes, particularly in mountain clades. Our results underscore the importance of exploring the complexities of target-capture data, not only to improve phylogenetic resolution but also to understand the sources of phylogenetic conflict and the underlying molecular evolutionary processes.
Marchand, B.; Tahiri, N.; Tremblay-Savard, O.; Lafond, M.
Show abstract
Phylogenetic networks are widespread representations of evolutionary histories for taxa that undergo hybridization or Lateral-Gene Transfer (LGT) events. There are now many tools to reconstruct such networks, but no clearly established metric to compare them. Such metrics are needed, for example, to evaluate predictions against a simulated ground truth. Despite years of effort in developing metrics, known dissimilarity measures either do not distinguish all pairs of different networks, or are extremely difficult to compute. Since it appears challenging, if not impossible, to create the ideal metric for all classes of networks, it may be relevant to design them for specialized applications. In this article, we introduce a metric on LGT networks, which consist of trees with additional arcs that represent lateral gene transfer events. Our metric is based on edit operations, namely the addition/removal of transfer arcs, and the contraction/expansion of arcs of the base tree, allowing it to connect the space of all LGT networks. We show that it is linear-time computable if the order of transfers along a branch is unconstrained but NP-hard otherwise, in which case we provide a fixed-parameter tractable (FPT) algorithm in the level. We implemented our algorithms and demonstrate their applicability on three numerical experiments. Full online versionhttps://www.biorxiv.org/content/10.1101/2025.11.20.689557
Ivan, J.; Lanfear, R.
Show abstract
AO_SCPLOWBSTRACTC_SCPLOWMany phylogenomic studies used non-overlapping windows to address gene tree discordance across a set of aligned genomes. Recently, Ivan et al. (2025) proposed an information theoretic approach to choose an optimal window size given the alignment. However, this approach selects only a single fixed window size per chromosome, which is a useful first step but fails to account for variation in the size of non-recombining regions along each chromosome. Such variation is expected to occur due to the stochastic nature of recombination as well as the variation in recombination rates along chromosomes. In this study, we extend the approach of Ivan et al. (2025) to allow window sizes to vary across the chromosome, using a splitting-and-merging strategy that allows for each window to be of an arbitrary length. We showed that the new method outperformed the fixed-window approach in recovering gene tree topologies on a wide range of simulated datasets. Applying the new method on the genomes of seven Heliconius butterflies, we found that the average window sizes for the group ranged between 538-808bp, but with a very similar distribution of gene tree topologies compared to previous studies that used fixed window sizes. For the genomes of great apes, the average window sizes ranged from 4.2kb to 6.2kb, with the proportion of the major topology (i.e., grouping human and chimpanzee together) reaching approximately 80%. In conclusion, our study highlights the limitations of using a fixed window size when recombination rates vary across the chromosomes, and proposes a splitting-and-merging approach that allows for variable window sizes across whole genome alignments.
Berv, J. S.; Fox, N.; Thorstensen, M. J.; Lloyd-Laney, H.; Troyer, E. M.; Rivero-Vega, R. A.; Smith, S. A.; Friedman, M.; Fouhey, D. F.; Weeks, B. C.
Show abstract
O_LIHigh-dimensional comparative datasets, including geometric morphometric landmarks, functional traits, and other large trait datasets, are increasingly common in biology. When these datasets include a large number of traits relative to the number of taxa, they pose significant challenges for phylogenetic comparative analysis. In addition, evolutionary dynamics are often heterogeneous across phylogenies, challenging researchers to develop tools that can localize and account for such variation when investigating hypotheses of evolutionary change. C_LIO_LIWe present bifrost, an R package for detecting and characterizing shifts in multivariate trait evolution across phylogenetic trees. bifrost implements a stepwise greedy search over alternative macroevolutionary regime configurations on a phylogeny. Candidate shifts are proposed and assessed at internal nodes, accelerated with parallel model fitting where possible, and aggregated sequentially when they exceed a user-defined information-criterion acceptance threshold. C_LIO_LIThe underlying model is a scalar-rate multivariate Brownian motion process fit by generalized least squares using mvMORPH::mvgls [1]. Our framework also provides support estimates for individual shifts using information-criterion weights. C_LIO_LIWe illustrate the workflow using a fossil-tip-dated phylogeny and high-dimensional landmark data for early bony fish jaws (32,508 scalar coordinate values), and discuss tuning, outputs, and limitations. bifrost extends existing phylogenetic comparative frameworks for evolutionary analysis and provides a scalable pipeline for exploring the phylogenetic natural history of large multivariate datasets. C_LI
Miller, E.; Sanchez Reyes, L.; McTavish, E. J.
Show abstract
Birds are frequently used as a focal taxon for evolutionary and ecological studies. Thousands of papers a year are published using birds as study systems. Hundreds more are published clarifying the evolutionary relationships among clades or regionally circumscribed sets of bird species. Up to date phylogenies are essential for informing and guiding avian studies, controlling for the expectation of shared trait evolution, and for science communication, among other applications. However, employing up to date phylogenies to address these questions has proven challenging for a number of reasons. First, individually published phylogenies are often hard to access in a usable manner. For example, sequences are usually made available, and an image of the phylogeny is published, but the actual phylogeny data product is often not digitally available. Second, published phylogenies often do not include all taxa of interest or have branch lengths in units of time. Third, taxonomic mismatches between phylogenies and existing datasets can complicate analyses. We address these issues by sharing an R package, clootl, that wraps together a new, complete, dated bird phylogeny with easy to use tools to extract trees for taxa of interest and sample over uncertainty. The phylogeny incorporates information from more than 300 individually published bird phylogenies. The R package includes tools to help appropriately cite these input studies. This software will enable users to smoothly integrate accurate evolutionary information into any analyses on birds.
Montoya, P.; Fabre, A.-C.; Goswami, A.; Morlon, H.; Clavel, J.
Show abstract
Multivariate phylogenetic comparative methods for modelling high-dimensional traits such as 3D shapes or gene expression proBiles have been recently developed. However, these approaches are impractical and almost impossible to use when the number of traits exceeds a few thousands, as they become computationally prohibitive. We overcome these limitations by proposing a new maximum likelihood approach based on the Empirical Bayes framework. This approach takes into account the information of the complete covariances (among species and traits) to infer parameters and compare models of trait evolution for high-dimensional datasets. Through simulations, we demonstrate that the proposed approach can accurately estimate parameters of various trait evolution models, even when the number of traits is ten times larger than the number of lineages; it requires less memory and is at least 10 times faster than currently available approaches. This fast, efBicient framework enabled us to extend the high-dimensional multivariate phylogenetic comparative toolkit by including an Ornstein-Uhlenbeck process with multiple optima to study adaptation to various selective regimes. Applying our approach to the evolution of jaw morphology in relation to dietary adaptation in mammals, we demonstrate morphological convergence in carnivorous and herbivorous lineages. The proposed Empirical Bayes framework, implemented in the R package mvMORPH, enables phylogenetic comparative methods to efBiciently handle high-dimensional datasets and complex models of trait evolution.